Speech recognition performance comparison between DSR and AMR transcoded speech
Authors
Abstract
In this paper, the speech recognition performance of the Distributed Speech Recognition (DSR) architecture is compared with that obtained when the speech is first transcoded with the Adaptive Multi-Rate (AMR) speech codec at 4.75 and 12.2 kbps. In a like-for-like comparison using the Advanced DSR Front-end and the Aurora reference back-end, the DSR architecture gives substantial gains in speech recognition performance. The evaluations measure the change in Word Error Rate (WER) on the Aurora 2 and Aurora 3 databases with "perfect" endpoints. Performance with AMR 4.75 is 50% worse than with DSR on Aurora 2 and 47% worse on Aurora 3. Even at the higher data rate of AMR 12.2, AMR is 17% worse than DSR on Aurora 2 and 20% worse on Aurora 3.
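The abstract reports results as WER and as percentage degradations of AMR relative to DSR. The sketch below shows how such figures are typically computed. It assumes that "X% worse" means a relative increase in WER over the DSR baseline, and it computes WER with a plain word-level Levenshtein alignment. It is not the Aurora reference scoring tool, and the function names and example values are hypothetical, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference length,
    computed with a word-level Levenshtein alignment (hypothetical helper)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # dp[i][j] = minimum edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


def relative_degradation(wer_baseline: float, wer_test: float) -> float:
    """Relative WER increase of a test system over a baseline, in percent
    (assumed reading of "X% worse" in the abstract)."""
    return 100.0 * (wer_test - wer_baseline) / wer_baseline


if __name__ == "__main__":
    # Hypothetical digit-string example: one substitution + one deletion.
    print(word_error_rate("one two three four", "one too three"))
    # Hypothetical WERs: baseline (e.g. DSR) 10%, test (e.g. AMR) 15%.
    print(relative_degradation(10.0, 15.0))
```

Under this reading, a system with a WER 1.5 times the baseline is reported as "50% worse", which matches the form of the figures quoted above without reproducing the paper's actual WER values.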
Similar resources
Adaptive Multi-Rate Wir
Distributed speech recognition (DSR) is motivated by the fact that the codecs used for speech transmission usually degrade voice quality below a certain channel quality (carrier-to-interferer ratio, C/I), which justifies efficient coding of features with appropriate channel coding in the mobile terminal. The Adaptive Multi-Rate (AMR) speech codec standardized for GSM and UMTS, however, delivers...
Statistical Tests for Voice Activity Detection
A robust and effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The approach is based on filtering the input channel to avoid high-energy noisy components and then determining the speech/non-speech bispectra by means of third-order autocumulants. This algorithm differs from many others in the way the decisi...
Bispectra Analysis-Based VAD for Robust Speech Recognition
A robust and effective voice activity detection (VAD) algorithm is proposed for improving speech recognition performance in noisy environments. The approach is based on filtering the input channel to avoid high-energy noisy components and then determining the speech/non-speech bispectra by means of third-order autocumulants. This algorithm differs from many others in the way the decisi...
Bispectrum-Based Statistical Tests for VAD
In this paper we propose a voice activity detection (VAD) algorithm for improving speech recognition performance in noisy environments. The approach is based on statistical tests applied to multiple observation windows, based on determining the speech/non-speech bispectra by means of third-order auto-cumulants. This algorithm differs from many others in the way the decision rule is formu...
Independent Component Analysis Applied to Voice Activity Detection
In this paper we present the first application of Independent Component Analysis (ICA) to Voice Activity Detection (VAD). The accuracy of a multiple observation likelihood ratio test (MO-LRT) VAD is improved by transforming the set of observations into a new set of independent components. Clear improvements in speech/non-speech discrimination accuracy at low false-alarm rates demonstrate the effe...